This report presents a re-analysis workflow for publicly available scRNA-seq data collected from cells of human atherosclerotic lesions or adjacent artery tissue.
| Authors | Year | Sample details | Number of samples | Sample groups | GSE ID | Publication |
|---|---|---|---|---|---|---|
| Alsaigh T. et al. | 2020 | carotid artery | 6 | Patient-matched calcified atherosclerotic plaques (n=3), and proximal adjacent tissues (n=3) | GSE159677 | https://pubmed.ncbi.nlm.nih.gov/36224302/ |
| Pan H. et al. | 2020 | carotid artery | 3 | Atherosclerotic carotid arteries from patients undergoing endarterectomy (n=3) | GSE155512 | https://pubmed.ncbi.nlm.nih.gov/32962412/ |
| Wirka R.C. et al. | 2019 | coronary artery | 8 | Atherosclerotic coronary arteries from explanted hearts of patients undergoing heart transplantation (n=4), with replicates | GSE131778 | https://pubmed.ncbi.nlm.nih.gov/31359001/ |
The analysis was performed using Seurat (version 4.9.9.9067) with SeuratObject (version 4.9.9.9091).
Clinical characteristics of samples and patients
Technical characteristics of DNA libraries pooled by sample while processing via CellRanger
QC metrics shown per sample: nFeature_RNA, nCount_RNA, percent_largest_gene, percent_mito, percent_ribo, percent_hb, percent_malat1, novelty_score.
The mito-ribo ratio (MRR) is a rather useful estimate of cell quality. Barcodes with a higher MRR (more mitochondrial and fewer ribosomal genes expressed) usually turn out to be dying or apoptotic cells, or cell debris.
Here, dashed lines denote approximate thresholds of the relevant parameters picked “by eye”:
- nGenes = [200; 4500]
- MRR ≤ 0.5
Another useful characteristic is an estimate of DNA library complexity called the novelty score (NS). Sometimes this metric reveals contamination with low-complexity cell types such as red blood cells. Usually, it is expected to be above 0.80.
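The two per-barcode metrics described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the report does not spell out either formula, so the novelty score is taken here as log10(genes detected) / log10(UMIs), the definition commonly used in QC teaching materials, and MRR is taken as percent_mito divided by percent_ribo. The function names and the two example barcodes are hypothetical.

```python
import math

def novelty_score(n_genes: int, n_umis: int) -> float:
    """Assumed definition: log10(genes detected) / log10(UMIs counted)."""
    return math.log10(n_genes) / math.log10(n_umis)

def mito_ribo_ratio(percent_mito: float, percent_ribo: float) -> float:
    """Assumed definition: mitochondrial % divided by ribosomal %."""
    if percent_ribo == 0:
        return float("inf")  # no ribosomal counts at all: treat as worst case
    return percent_mito / percent_ribo

# Hypothetical barcodes, not taken from the datasets in this report.
examples = {
    "typical cell": {"n_genes": 2000, "n_umis": 6000, "mito": 4.0, "ribo": 20.0},
    "suspect barcode": {"n_genes": 300, "n_umis": 5000, "mito": 9.0, "ribo": 6.0},
}

for name, c in examples.items():
    ns = novelty_score(c["n_genes"], c["n_umis"])
    mrr = mito_ribo_ratio(c["mito"], c["ribo"])
    print(f"{name}: NS={ns:.2f}, MRR={mrr:.2f}")
```

With these made-up numbers, the second barcode falls below the 0.80 novelty threshold and above the 0.5 MRR threshold, which is the pattern the text associates with low-complexity or dying cells.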
What are the 10 most highly expressed genes in each sample (library)?
A high percentage of MALAT1 and MT genes points to either poor-quality (dying, apoptotic) cells or contamination by ambient RNA.
As we see above, MALAT1 is highly represented mostly in cells with a high MRR. However, it is also highly enriched in cells with MRR < 0.5 but with relatively few (< 2000) genes detected per cell. Hence, the presence of ambient RNA is expected, possibly due to abundant debris in atherosclerotic lesions.
Let’s start by correcting for any potential RNA contamination from the surroundings, and afterward, we’ll implement quality control filtering.
Cell-free mRNA contamination within the input solution is commonly referred to as “the soup”; it originates from cell lysis. We will check for it and correct for it using the SoupX package.
The top 20 genes with the highest expression in the background (useful for picking “soup”-specific genes). These are often enriched for ribosomal proteins.
Automatically estimated ambient RNA total contamination rates
The top 15 genes set to zero in some fraction of cells after SoupX correction.
We observe that certain genes, which have a high level of expression in the ‘soup,’ had their expression levels either reduced to zero or decreased by one or more orders of magnitude.
Conversely, the impact of SoupX correction on certain marker genes for the main cell types is not as dramatic.
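To make the effect described above more concrete, here is a deliberately simplified Python illustration of the idea behind ambient RNA correction. It is not SoupX's algorithm (SoupX's own correction is more elaborate); it only assumes that a fraction `rho` of a cell's UMIs comes from the soup and that those counts are distributed across genes according to the soup expression profile. All names, fractions, and counts are hypothetical toy values.

```python
def expected_ambient_counts(soup_fraction, cell_total_umis, rho):
    """Expected background counts per gene: rho * total UMIs * soup fraction."""
    return {gene: rho * cell_total_umis * frac for gene, frac in soup_fraction.items()}

def subtract_ambient(observed, soup_fraction, rho):
    """Subtract expected background from observed counts, flooring at zero."""
    total = sum(observed.values())
    expected = expected_ambient_counts(soup_fraction, total, rho)
    return {gene: max(count - expected.get(gene, 0.0), 0.0)
            for gene, count in observed.items()}

# Toy cell with 200 UMIs and an assumed contamination rate of 10%.
observed = {"MALAT1": 40, "RPS27": 20, "ACTA2": 138, "HBB": 2}
# Toy soup profile (fractions of all background counts; sums to 1).
soup_fraction = {"MALAT1": 0.50, "RPS27": 0.30, "ACTA2": 0.05, "HBB": 0.15}

corrected = subtract_ambient(observed, soup_fraction, rho=0.10)
for gene in observed:
    print(f"{gene}: {observed[gene]} -> {corrected[gene]:.1f}")
```

In this toy case a gene that is rare in the cell but common in the soup (HBB) drops to zero, while a highly expressed marker (ACTA2) changes by less than 1%, which mirrors the two observations above.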
There might be different “bad cells” (barcodes): empty drops, fragments of cells, dead cells, etc.
There are several parameters we can explore for cells (barcodes) quality assessment.
- Number of detected genes (NDG, nFeature_RNA): in the original papers, limited to between 200 and 4000 genes (Alsaigh, Pan), or between 500 and 3500 (Wirka). Let’s start with 200 detected genes as the minimum and 4500 as the maximum.
- Number of gene counts (number of UMIs) (NUMI, nCount_RNA): usually limited to between 500 and 50000 counts. A UMI cutoff is mentioned only in the Pan et al. paper, where the number of UMIs is limited to 20000. For now we will use only the lower cutoff of 500 UMIs and set the upper threshold later, so as not to overestimate the doublet rate.
- Mitochondrial gene counts ratio (MTP, percent_mito): usually must be less than 10%. The papers by Pan et al. and Alsaigh et al. use a 10% cutoff. In the Wirka et al. study it is more stringent (< 7.5%). Let’s first use the more relaxed 10%, since we define an additional cutoff of MRR < 0.5.
- Hemoglobin gene counts ratio (HBP, percent_hb): must be less than 1%, which helps to remove red blood cell contamination. There are almost no cells in the analysed libraries with noticeable levels of HB genes.
- Complexity: novelty score (NS, novelty_score): recommended to be more than 0.8.
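The criteria listed above can be expressed as a set of per-barcode pass/fail flags. The following Python sketch is only an illustration: the `Barcode` class, the function names, and the example values are hypothetical, and whether each boundary is inclusive or exclusive is an assumption, since the report does not specify it.

```python
from dataclasses import dataclass

@dataclass
class Barcode:
    n_genes: int          # nFeature_RNA
    n_umis: int           # nCount_RNA
    percent_mito: float   # percent_mito
    percent_ribo: float   # percent_ribo
    percent_hb: float     # percent_hb
    novelty_score: float  # novelty_score

def qc_flags(b: Barcode) -> dict:
    """One pass/fail flag per criterion, using the cutoffs stated in the text."""
    # MRR assumed to be percent_mito / percent_ribo (not defined in the report).
    mrr = b.percent_mito / b.percent_ribo if b.percent_ribo > 0 else float("inf")
    return {
        "NDG": 200 <= b.n_genes <= 4500,
        "NUMI": b.n_umis >= 500,
        "NS": b.novelty_score > 0.8,
        "MTP": b.percent_mito < 10.0,
        "MRR": mrr < 0.5,
        "HBP": b.percent_hb < 1.0,
    }

def passes_all(b: Barcode) -> bool:
    return all(qc_flags(b).values())

# Hypothetical barcodes, not taken from the analysed libraries.
good = Barcode(2500, 8000, 3.0, 15.0, 0.0, 0.87)
dying = Barcode(350, 900, 14.0, 5.0, 0.0, 0.86)

for name, cell in (("good", good), ("dying", dying)):
    failed = [k for k, ok in qc_flags(cell).items() if not ok]
    print(name, "passes all" if passes_all(cell) else f"fails: {failed}")
```

Keeping one flag per criterion, as opposed to a single combined filter, makes it easy to tabulate how many barcodes each cutoff removes on its own.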
How many cells pass these criteria?
Let’s take a look at QC metrics after SoupX correction and estimate how many cells pass defined criteria.
| sample_id | N_Cells | ALL | Passed NDG cutoff | Passed NUMI cutoff | Passed NS cutoff | Passed MTP cutoff | Passed MRR cutoff | Passed HBP cutoff | Passed all cutoffs |
|---|---|---|---|---|---|---|---|---|---|
| A.GSM4837523 | 11047 | 10081 | 10971 (99.3%) | 11034 (99.9%) | 10962 (99.2%) | 10350 (93.7%) | 10232 (92.6%) | 11046 (100%) | 10081 (91.3%) |
| A.GSM4837524 | 3765 | 3069 | 3667 (97.4%) | 3760 (99.9%) | 3685 (97.9%) | 3164 (84%) | 3111 (82.6%) | 3765 (100%) | 3069 (81.5%) |
| A.GSM4837525 | 16040 | 13613 | 15961 (99.5%) | 16021 (99.9%) | 15522 (96.8%) | 14383 (89.7%) | 14410 (89.8%) | 16040 (100%) | 13613 (84.9%) |
| A.GSM4837526 | 5590 | 4725 | 5558 (99.4%) | 5588 (100%) | 5432 (97.2%) | 5021 (89.8%) | 4926 (88.1%) | 5590 (100%) | 4725 (84.5%) |
| A.GSM4837527 | 12531 | 10515 | 12335 (98.4%) | 12522 (99.9%) | 12468 (99.5%) | 10944 (87.3%) | 10887 (86.9%) | 12530 (100%) | 10515 (83.9%) |
| A.GSM4837528 | 3404 | 2757 | 3352 (98.5%) | 3399 (99.9%) | 3373 (99.1%) | 2921 (85.8%) | 2801 (82.3%) | 3404 (100%) | 2757 (81%) |
| P.GSM4705589 | 3441 | 2702 | 3144 (91.4%) | 3437 (99.9%) | 3286 (95.5%) | 2954 (85.8%) | 2925 (85%) | 3441 (100%) | 2702 (78.5%) |
| P.GSM4705590 | 4692 | 3617 | 4319 (92.1%) | 4689 (99.9%) | 4407 (93.9%) | 3934 (83.8%) | 4025 (85.8%) | 4692 (100%) | 3617 (77.1%) |
| P.GSM4705591 | 3138 | 2768 | 3103 (98.9%) | 3134 (99.9%) | 3127 (99.6%) | 2873 (91.6%) | 2809 (89.5%) | 3138 (100%) | 2768 (88.2%) |
| W.GSM3819856 | 1828 | 1528 | 1825 (99.8%) | 1822 (99.7%) | 1580 (86.4%) | 1792 (98%) | 1778 (97.3%) | 1828 (100%) | 1528 (83.6%) |
| W.GSM3819857 | 731 | 594 | 731 (100%) | 727 (99.5%) | 624 (85.4%) | 716 (97.9%) | 706 (96.6%) | 731 (100%) | 594 (81.3%) |
| W.GSM3819858 | 2225 | 2007 | 2221 (99.8%) | 2210 (99.3%) | 2225 (100%) | 2049 (92.1%) | 2044 (91.9%) | 2225 (100%) | 2007 (90.2%) |
| W.GSM3819859 | 1975 | 1789 | 1974 (99.9%) | 1959 (99.2%) | 1975 (100%) | 1838 (93.1%) | 1829 (92.6%) | 1975 (100%) | 1789 (90.6%) |
| W.GSM3819860 | 3012 | 2872 | 3008 (99.9%) | 3005 (99.8%) | 2938 (97.5%) | 2959 (98.2%) | 2963 (98.4%) | 3012 (100%) | 2872 (95.4%) |
| W.GSM3819861 | 3192 | 3026 | 3180 (99.6%) | 3178 (99.6%) | 3130 (98.1%) | 3108 (97.4%) | 3124 (97.9%) | 3192 (100%) | 3026 (94.8%) |
| W.GSM3819862 | 2907 | 2754 | 2904 (99.9%) | 2896 (99.6%) | 2846 (97.9%) | 2836 (97.6%) | 2839 (97.7%) | 2906 (100%) | 2754 (94.7%) |
| W.GSM3819863 | 2547 | 2408 | 2541 (99.8%) | 2543 (99.8%) | 2518 (98.9%) | 2472 (97.1%) | 2450 (96.2%) | 2546 (100%) | 2408 (94.5%) |
Doublets/multiplets are defined as two or more cells that are sequenced under the same cellular barcode. They can be formed from the same (homotypic) or different (heterotypic) cell types. Their identification is crucial, as they are most likely misclassified and can distort downstream analysis steps.
To detect putative doublets, we will use three R packages:
- DoubletFinder
- scDblFinder
- Scrublet, adapted to run from R (scrubletR)
The analysed datasets were obtained by scRNA-seq of cell suspensions pre-sorted by cell size and viability, so they should contain a rather low percentage of droplets with doublets/multiplets. A doublet rate of about 3% is the maximum reported in the related publications (in the Alsaigh et al. paper). We set this rate higher, up to 8%, to be sure to filter out putative doublets.
Let’s do initial merging of sample datasets to visually compare the efficiency of doublet detection methods and QC.
QC metrics of initially merged dataset:
Consensus between the different doublet detection methods: how much do the sets of cells called doublets overlap?
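| Scrublet | scDblFinder | DoubletFinder: Doublet | DoubletFinder: Singlet |
|---|---|---|---|
| Doublet | Doublet | 395 | 779 |
| Doublet | Singlet | 13 | 102 |
| Singlet | Doublet | 487 | 3042 |
| Singlet | Singlet | 4064 | 61869 |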
Since we have enough cells, let’s discard all barcodes considered as doublet by any method.
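The consequence of discarding the union of all calls can be worked out directly from the contingency table above. The short Python sketch below only re-tabulates those published counts; the variable names are illustrative.

```python
TOOLS = ("Scrublet", "scDblFinder", "DoubletFinder")

# Keys: (Scrublet, scDblFinder, DoubletFinder) calls; values: barcode counts
# copied from the contingency table in this report.
counts = {
    ("Doublet", "Doublet", "Doublet"): 395,
    ("Doublet", "Doublet", "Singlet"): 779,
    ("Doublet", "Singlet", "Doublet"): 13,
    ("Doublet", "Singlet", "Singlet"): 102,
    ("Singlet", "Doublet", "Doublet"): 487,
    ("Singlet", "Doublet", "Singlet"): 3042,
    ("Singlet", "Singlet", "Doublet"): 4064,
    ("Singlet", "Singlet", "Singlet"): 61869,
}

total = sum(counts.values())
# Union: a barcode is discarded if at least one tool calls it a doublet.
flagged_by_any = sum(n for calls, n in counts.items() if "Doublet" in calls)
# Intersection: barcodes that all three tools agree on.
flagged_by_all = counts[("Doublet", "Doublet", "Doublet")]
# Marginal totals per tool.
per_tool = {
    tool: sum(n for calls, n in counts.items() if calls[i] == "Doublet")
    for i, tool in enumerate(TOOLS)
}

print(f"total barcodes: {total}")
print(f"flagged by any tool: {flagged_by_any} ({flagged_by_any / total:.1%})")
print(f"flagged by all three: {flagged_by_all}")
print(f"per tool: {per_tool}")
```

The union removes 8882 of 70751 barcodes (about 12.6%), while only 395 barcodes are called doublets by all three tools, so the "any method" rule is considerably more conservative than a strict consensus.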
Additionally, we employ a filtering criterion based on the number of UMI counts per cell (nCount_RNA should be less than 15000) to mitigate the potential presence of putative doublets in the dataset. We will also filter out all CD45+ cells located within non-leukocyte superclusters (endothelial and smooth muscle cells).
Number of cells with gene counts outside the thresholds (nCount_RNA > th_nch) in total and by library (sample):
Total number of cells with gene counts > 15000
FALSE TRUE
67336 3415
Number of cells with gene counts > 15000 by sample
FALSE TRUE
A.GSM4837523 9208 871
A.GSM4837524 2957 111
A.GSM4837525 13173 432
A.GSM4837526 4539 185
A.GSM4837527 9785 725
A.GSM4837528 2575 179
P.GSM4705589 2416 285
P.GSM4705590 3211 405
P.GSM4705591 2655 109
W.GSM3819856 1518 4
W.GSM3819857 591 0
W.GSM3819858 1997 3
W.GSM3819859 1774 5
W.GSM3819860 2844 23
W.GSM3819861 3003 14
W.GSM3819862 2738 10
W.GSM3819863 2352 54
To get more cues about the cell types in the clusters detected in the initially merged dataset, we will use a pre-defined list of markers for the anticipated cell types:
We suggest the following main cell types
| Cell type | Cluster |
|---|---|
| CD45+ cells (leukocytes) | 0,1,2,6,8,9,14,19,20,21,24 |
| Endothelial cells | 4,10,11,23 |
| Smooth muscle cells + Fibroblasts | 3,5,7,12,13,15,16,17,18,22 |
Detected CD45+ cells in non-leukocytes clusters:
Number of CD45+ cells in endothelial clusters (EC):
CD45-EC CD45+EC
9271 392
Number of CD45+ cells in smooth muscle clusters (SMC):
CD45-SMC CD45+SMC
20835 739
The number of detected genes in these cells
The number of detected genes in these suspicious cell groups is higher, which may indicate a higher probability that these “cells” are doublets not detected by any of the tools used for this purpose.
Thus, assuming that a transition of VSMCs and ECs to CD45+ myeloid cells is not possible (although we cannot say the same about the reverse transition), we may estimate the observed rate of heterotypic leukocyte-derived doublets to be about 3-4%.
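The 3-4% figure follows from the CD45+ counts reported above for the endothelial and smooth muscle superclusters. A minimal Python check of that arithmetic (variable names are illustrative):

```python
# Counts copied from the tables above.
ec_cd45_neg, ec_cd45_pos = 9271, 392      # endothelial clusters
smc_cd45_neg, smc_cd45_pos = 20835, 739   # smooth muscle clusters

# Share of CD45+ barcodes within each non-leukocyte supercluster.
ec_rate = ec_cd45_pos / (ec_cd45_neg + ec_cd45_pos)
smc_rate = smc_cd45_pos / (smc_cd45_neg + smc_cd45_pos)

print(f"CD45+ share in EC clusters:  {ec_rate:.1%}")
print(f"CD45+ share in SMC clusters: {smc_rate:.1%}")
```

This gives roughly 4.1% for the endothelial clusters and 3.4% for the smooth muscle clusters.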
Let’s estimate the number of cells passing all filters.
Total number of cells passed after doublet detection and 'gene counts' cutoff:
Discarded Passed
11255 59496
Number of cells in each sample that passed after doublet detection and the 'gene counts' cutoff:
Discarded Passed
A.GSM4837523 2008 8071
A.GSM4837524 372 2696
A.GSM4837525 2302 11303
A.GSM4837526 625 4099
A.GSM4837527 1986 8524
A.GSM4837528 443 2311
P.GSM4705589 502 2199
P.GSM4705590 733 2883
P.GSM4705591 386 2378
W.GSM3819856 165 1357
W.GSM3819857 68 523
W.GSM3819858 275 1725
W.GSM3819859 205 1574
W.GSM3819860 309 2558
W.GSM3819861 310 2707
W.GSM3819862 283 2465
W.GSM3819863 283 2123
Since the priority of this study is plaque cells, we will also remove the samples of proximal adjacent tissue (PA samples: GSM4837524, GSM4837526, GSM4837528) from the Alsaigh et al. dataset, which seem to contain some fraction of adventitial cells.
Let’s proceed to normalize the remaining samples utilizing standard LogNorm workflow in Seurat and then integrate them using Harmony.
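For readers unfamiliar with log-normalization, the per-cell transformation can be sketched in plain Python: each gene count is divided by the cell's total counts, multiplied by a scale factor, and transformed with the natural log of (1 + x). The scale factor of 10000 used here is Seurat's default and is an assumption, since the report does not state the value used; the function name and toy counts are hypothetical, and Harmony integration is not covered by this sketch.

```python
import math

def log_normalize(counts, scale_factor=10_000.0):
    """Per-cell log-normalization: log1p(count / total * scale_factor)."""
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}  # empty barcode: nothing to scale
    return {gene: math.log1p(c / total * scale_factor) for gene, c in counts.items()}

# Two toy cells with identical composition but different sequencing depth.
shallow = {"geneA": 1, "geneB": 9, "geneC": 0}
deep = {"geneA": 10, "geneB": 90, "geneC": 0}

print(log_normalize(shallow))
print(log_normalize(deep))
```

Both toy cells yield the same normalized values, which is the point of the depth scaling; zero counts remain zero.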
Cell clusters:
QC Metrics
Cell clusters dendrogram:
Cell clusters by source:
First, let’s look at how some pre-selected marker genes specific for atherosclerotic arteries are expressed.
Top markers (up-regulated genes) for cell clusters
The expression of cluster-specific marker genes (top 10 for each cluster) identified by the MAST method.
Top 5 marker genes:
Automatic cell type annotation using
| Predicted cell type | C0 | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 | C11 | C12 | C13 | C14 | C15 | C16 | C17 | C18 | C19 | C20 | C21 | C22 | C23 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Astrocyte | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 24 | 1 | 0 | 0 | 0 |
| B_cell | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 2271 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 6 | 0 | 1 | 1 | 0 | 18 |
| Chondrocytes | 0 | 0 | 2104 | 3 | 0 | 1415 | 4 | 0 | 2 | 0 | 0 | 156 | 974 | 136 | 0 | 472 | 0 | 0 | 28 | 22 | 0 | 93 | 2 | 0 |
| CMP | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 627 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| DC | 0 | 0 | 1 | 4 | 177 | 1 | 0 | 132 | 1 | 0 | 7 | 1 | 0 | 0 | 3 | 0 | 33 | 66 | 0 | 0 | 2 | 0 | 0 | 4 |
| Endothelial_cells | 0 | 0 | 8 | 4292 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 7 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 1 | 3 | 2 | 257 | 0 |
| Epithelial_cells | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Fibroblasts | 0 | 0 | 96 | 3 | 0 | 208 | 0 | 0 | 0 | 0 | 0 | 54 | 28 | 118 | 1 | 36 | 0 | 0 | 26 | 5 | 1 | 21 | 0 | 0 |
| GMP | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 41 | 0 | 0 | 0 | 1 | 0 | 4 | 0 | 0 | 12 |
| Hepatocytes | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 |
| HSC_-G-CSF | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| HSC_CD34+ | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 |
| Macrophage | 0 | 0 | 0 | 1 | 2654 | 2 | 1 | 1450 | 0 | 0 | 35 | 0 | 0 | 0 | 16 | 0 | 376 | 24 | 3 | 0 | 55 | 0 | 1 | 0 |
| Monocyte | 0 | 0 | 0 | 0 | 1385 | 0 | 0 | 919 | 3 | 0 | 1937 | 0 | 0 | 0 | 69 | 0 | 336 | 391 | 1 | 0 | 33 | 0 | 0 | 34 |
| MSC | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Neurons | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 288 | 1 | 1 | 0 | 0 |
| Neutrophils | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 0 | 34 | 0 | 0 | 0 | 5 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 |
| NK_cell | 324 | 18 | 0 | 0 | 0 | 0 | 23 | 3 | 6 | 1537 | 0 | 0 | 0 | 0 | 96 | 0 | 0 | 0 | 15 | 0 | 33 | 0 | 0 | 1 |
| Osteoblasts | 0 | 0 | 18 | 0 | 0 | 15 | 0 | 0 | 0 | 0 | 0 | 3 | 16 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | 0 |
| Pre-B_cell_CD34- | 4 | 3 | 0 | 1 | 0 | 0 | 10 | 5 | 9 | 1 | 0 | 0 | 0 | 0 | 13 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 42 |
| Pro-B_cell_CD34+ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 7 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 |
| Pro-Myelocyte | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Smooth_muscle_cells | 0 | 0 | 791 | 13 | 0 | 1677 | 0 | 0 | 0 | 0 | 0 | 651 | 628 | 624 | 0 | 222 | 0 | 0 | 42 | 35 | 0 | 133 | 5 | 0 |
| T_cells | 5845 | 4841 | 0 | 1 | 0 | 1 | 2930 | 5 | 6 | 562 | 1 | 0 | 0 | 0 | 6 | 0 | 0 | 0 | 282 | 0 | 280 | 2 | 1 | 0 |
| Tissue_stem_cells | 0 | 0 | 1652 | 20 | 0 | 822 | 6 | 0 | 2 | 0 | 0 | 1080 | 303 | 238 | 1 | 110 | 0 | 0 | 71 | 78 | 0 | 90 | 5 | 0 |
| Predicted cell type | C0 | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 | C11 | C12 | C13 | C14 | C15 | C16 | C17 | C18 | C19 | C20 | C21 | C22 | C23 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| B cell | 90 | 90 | 0 | 1 | 0 | 1 | 242 | 5 | 1420 | 6 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 4 | 41 | 0 | 6 | 0 | 0 | 63 |
| endothelial cell | 0 | 0 | 0 | 3835 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 3 | 0 | 146 | 0 |
| endothelial cell of artery | 0 | 0 | 0 | 224 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10 | 0 |
| fibroblast | 0 | 0 | 1 | 4 | 0 | 1470 | 1 | 0 | 0 | 0 | 0 | 6 | 117 | 1051 | 0 | 37 | 0 | 0 | 112 | 32 | 0 | 7 | 1 | 0 |
| macrophage | 81 | 24 | 3 | 62 | 4215 | 4 | 156 | 2506 | 579 | 9 | 2013 | 11 | 0 | 0 | 100 | 0 | 745 | 478 | 84 | 9 | 107 | 29 | 38 | 18 |
| mast cell | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 790 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| mature NK T cell | 103 | 6 | 0 | 0 | 0 | 0 | 8 | 0 | 1 | 1406 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 25 | 0 | 0 | 0 |
| pericyte | 0 | 0 | 21 | 3 | 0 | 9 | 0 | 0 | 0 | 0 | 0 | 900 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 16 | 0 | 0 | 0 | 0 |
| plasma cell | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 4 | 309 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 3 | 0 | 2 | 2 | 1 | 30 |
| smooth muscle cell | 1 | 0 | 4644 | 213 | 0 | 2656 | 12 | 0 | 1 | 0 | 0 | 1035 | 1831 | 65 | 2 | 803 | 0 | 0 | 44 | 397 | 3 | 308 | 76 | 0 |
| T cell | 5898 | 4742 | 0 | 1 | 0 | 1 | 2558 | 2 | 1 | 679 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 195 | 0 | 271 | 2 | 1 | 0 |
Main immune cell type gene signatures from Zernecke A. et al, 2023
Some gene markers of lipid uptake and foam cells
Clusters C7 (macrophages) and C11 (SMCs) highly co-express APOE, CD36, FABP4 and FABP5, which are related to lipid uptake and storage and may point to a possible “foamy” state of these cells.
Components of complement system
Some markers of osteoblasts
Cell numbers in manually annotated cell types and subtypes
| sample_id | T cell | Smooth muscle cell | Endothelial cell | Macrophage | B cell | Monocyte | Fibroblast | Mast cell | Dendritic cell | Fibroblast/Mixed | Neuron | Proliferating immune cells | Plasma cells |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A.GSM4837523 | 2839 | 1023 | 273 | 2726 | 35 | 804 | 1 | 64 | 108 | 73 | 0 | 110 | 15 |
| A.GSM4837525 | 6263 | 1430 | 1182 | 695 | 1047 | 218 | 49 | 135 | 38 | 148 | 0 | 80 | 18 |
| A.GSM4837527 | 3990 | 2419 | 76 | 796 | 129 | 509 | 15 | 231 | 95 | 89 | 0 | 122 | 53 |
| P.GSM4705589 | 26 | 1908 | 68 | 103 | 0 | 13 | 6 | 61 | 3 | 2 | 0 | 8 | 1 |
| P.GSM4705590 | 192 | 995 | 588 | 713 | 26 | 137 | 7 | 115 | 82 | 2 | 0 | 17 | 9 |
| P.GSM4705591 | 392 | 903 | 265 | 578 | 12 | 76 | 15 | 46 | 68 | 6 | 1 | 11 | 5 |
| W.GSM3819856 | 462 | 200 | 87 | 202 | 297 | 19 | 38 | 6 | 10 | 12 | 10 | 14 | 0 |
| W.GSM3819857 | 95 | 122 | 127 | 55 | 73 | 9 | 25 | 3 | 4 | 7 | 2 | 0 | 1 |
| W.GSM3819858 | 296 | 787 | 178 | 30 | 7 | 21 | 192 | 3 | 1 | 35 | 174 | 1 | 0 |
| W.GSM3819859 | 285 | 743 | 61 | 81 | 9 | 31 | 174 | 1 | 2 | 22 | 162 | 3 | 0 |
| W.GSM3819860 | 396 | 755 | 529 | 343 | 198 | 42 | 186 | 21 | 13 | 24 | 36 | 15 | 0 |
| W.GSM3819861 | 460 | 786 | 584 | 288 | 261 | 51 | 170 | 14 | 13 | 26 | 42 | 12 | 0 |
| W.GSM3819862 | 397 | 705 | 518 | 328 | 215 | 42 | 177 | 13 | 14 | 21 | 25 | 10 | 0 |
| W.GSM3819863 | 19 | 1126 | 80 | 540 | 2 | 43 | 61 | 182 | 31 | 13 | 3 | 14 | 9 |
| sample_id | T cell (Cytotoxic) | T cell (Activated) | SMC (Contractile) | Endothelial cell (C4) | Macrophage (Inflammatory) | SMC (Fibromyocyte C5) | T cell (Memory) | Macrophage (Foamy) | B cell | T cell (T/NK cell) | Monocyte | SMC (Foamy) | SMC (Fibromyocyte C12) | Fibroblast | Mast cell | SMC (Osteochondrogenic) | Macrophage (C16) | Dendritic cell | Fibroblast/Mixed (C18) | Neuron | Proliferating immune cell | SMC (Undefined C21) | Endothelial cell (C22) | Plasma cell |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A.GSM4837523 | 941 | 903 | 386 | 195 | 401 | 214 | 438 | 1605 | 35 | 557 | 804 | 28 | 78 | 1 | 64 | 299 | 720 | 108 | 73 | 0 | 110 | 18 | 78 | 15 |
| A.GSM4837525 | 2750 | 1834 | 208 | 1177 | 551 | 573 | 1131 | 138 | 1047 | 548 | 218 | 361 | 176 | 49 | 135 | 68 | 6 | 38 | 148 | 0 | 80 | 44 | 5 | 18 |
| A.GSM4837527 | 1283 | 1196 | 1115 | 59 | 614 | 731 | 1047 | 172 | 129 | 464 | 509 | 113 | 323 | 15 | 231 | 26 | 10 | 95 | 89 | 0 | 122 | 111 | 17 | 53 |
| P.GSM4705589 | 11 | 12 | 961 | 39 | 92 | 489 | 2 | 11 | 0 | 1 | 13 | 116 | 304 | 6 | 61 | 24 | 0 | 3 | 2 | 0 | 8 | 14 | 29 | 1 |
| P.GSM4705590 | 106 | 53 | 288 | 509 | 609 | 250 | 18 | 104 | 26 | 15 | 137 | 120 | 124 | 7 | 115 | 197 | 0 | 82 | 2 | 0 | 17 | 16 | 79 | 9 |
| P.GSM4705591 | 157 | 129 | 274 | 238 | 485 | 218 | 54 | 89 | 12 | 52 | 76 | 83 | 139 | 15 | 46 | 169 | 4 | 68 | 6 | 1 | 11 | 20 | 27 | 5 |
| W.GSM3819856 | 160 | 140 | 43 | 87 | 151 | 58 | 47 | 51 | 297 | 115 | 19 | 56 | 31 | 38 | 6 | 2 | 0 | 10 | 12 | 10 | 14 | 10 | 0 | 0 |
| W.GSM3819857 | 32 | 28 | 25 | 120 | 45 | 50 | 5 | 10 | 73 | 30 | 9 | 31 | 11 | 25 | 3 | 0 | 0 | 4 | 7 | 2 | 0 | 5 | 7 | 1 |
| W.GSM3819858 | 84 | 106 | 199 | 177 | 18 | 307 | 39 | 12 | 7 | 67 | 21 | 148 | 103 | 192 | 3 | 11 | 0 | 1 | 35 | 174 | 1 | 19 | 1 | 0 |
| W.GSM3819859 | 92 | 96 | 184 | 58 | 67 | 295 | 33 | 14 | 9 | 64 | 31 | 161 | 84 | 174 | 1 | 8 | 0 | 2 | 22 | 162 | 3 | 11 | 3 | 0 |
| W.GSM3819860 | 179 | 108 | 142 | 528 | 278 | 274 | 53 | 63 | 198 | 56 | 42 | 195 | 120 | 186 | 21 | 6 | 2 | 13 | 24 | 36 | 15 | 18 | 1 | 0 |
| W.GSM3819861 | 189 | 144 | 185 | 582 | 227 | 256 | 55 | 60 | 261 | 72 | 51 | 222 | 98 | 170 | 14 | 8 | 1 | 13 | 26 | 42 | 12 | 17 | 2 | 0 |
| W.GSM3819862 | 182 | 108 | 139 | 514 | 265 | 226 | 50 | 62 | 215 | 57 | 42 | 219 | 104 | 177 | 13 | 8 | 1 | 14 | 21 | 25 | 10 | 9 | 4 | 0 |
| W.GSM3819863 | 7 | 5 | 521 | 60 | 413 | 200 | 5 | 126 | 2 | 2 | 43 | 100 | 254 | 61 | 182 | 15 | 1 | 31 | 13 | 3 | 14 | 36 | 20 | 9 |
How are THOR genes expressed in cell clusters?
This workflow is inspired by several sources:
“Best practices for single-cell analysis across modalities” paper by Heumos et al (2023)
“Single-cell RNA-seq: Quality Control Analysis”, teaching materials at the Harvard Chan Bioinformatics Core
Quality Control chapter in “Single cell best practices” handbook
Single-cell data analysis pipeline elaborated by CellGenIT group
sessionInfo()
R version 4.2.3 (2023-03-15)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 22.04.1 LTS
Matrix products: default
BLAS/LAPACK: /home/amarkov/miniconda3/envs/r4.2/lib/libopenblasp-r0.3.24.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] harmony_1.0.1 Rcpp_1.0.9 UCell_2.2.0 DT_0.27
[5] khroma_1.9.0 MetBrewer_0.2.0 RColorBrewer_1.1-3 gdata_2.19.0
[9] Matrix_1.6-1.1 janitor_2.2.0 lubridate_1.9.3 forcats_1.0.0
[13] stringr_1.5.0 dplyr_1.1.3 purrr_1.0.2 readr_2.1.4
[17] tidyr_1.3.0 tibble_3.2.1 ggplot2_3.4.3 tidyverse_2.0.0
[21] patchwork_1.1.3 gridExtra_2.3 scrubletR_0.1.0 scDblFinder_1.10.0
[25] DoubletFinder_2.0.3 SoupX_1.6.2 scCustomize_1.1.1 Seurat_4.9.9.9067
[29] SeuratObject_4.9.9.9091 sp_2.1-0 SingleCellExperiment_1.20.0 SummarizedExperiment_1.28.0
[33] Biobase_2.58.0 GenomicRanges_1.50.0 GenomeInfoDb_1.34.9 IRanges_2.32.0
[37] S4Vectors_0.36.0 BiocGenerics_0.44.0 MatrixGenerics_1.10.0 matrixStats_1.0.0
loaded via a namespace (and not attached):
[1] ggprism_1.0.4 rtracklayer_1.58.0 scattermore_1.2 ragg_1.2.5
[5] bit64_4.0.5 knitr_1.44 irlba_2.3.5.1 DelayedArray_0.24.0
[9] data.table_1.14.8 RCurl_1.98-1.12 doParallel_1.0.17 generics_0.1.3
[13] ScaledMatrix_1.6.0 RhpcBLASctl_0.23-42 cowplot_1.1.1 RANN_2.6.1
[17] future_1.33.0 bit_4.0.5 tzdb_0.4.0 spatstat.data_3.0-1
[21] httpuv_1.6.11 viridis_0.6.4 xfun_0.40 jquerylib_0.1.4
[25] hms_1.1.3 evaluate_0.22 promises_1.2.1 progress_1.2.2
[29] fansi_1.0.5 restfulr_0.0.15 readxl_1.4.3 DBI_1.1.3
[33] igraph_1.4.2 htmlwidgets_1.6.2 spatstat.geom_3.2-5 paletteer_1.5.0
[37] ellipsis_0.3.2 crosstalk_1.2.0 RSpectra_0.16-1 prismatic_1.1.1
[41] deldir_1.0-9 sparseMatrixStats_1.10.0 vctrs_0.6.3 ROCR_1.0-11
[45] abind_1.4-5 cachem_1.0.8 withr_2.5.1 progressr_0.14.0
[49] vroom_1.6.4 presto_1.0.0 sctransform_0.4.0 GenomicAlignments_1.34.0
[53] prettyunits_1.2.0 scran_1.24.1 goftest_1.2-3 cluster_2.1.4
[57] ape_5.7-1 dotCall64_1.0-2 lazyeval_0.2.2 crayon_1.5.2
[61] spatstat.explore_3.2-3 labeling_0.4.3 edgeR_3.38.4 pkgconfig_2.0.3
[65] nlme_3.1-163 vipor_0.4.5 rlang_1.1.1 globals_0.16.2
[69] lifecycle_1.0.3 miniUI_0.1.1.1 fastDummies_1.6.3 rsvd_1.0.5
[73] cellranger_1.1.0 ggrastr_1.0.1 polyclip_1.10-6 RcppHNSW_0.5.0
[77] lmtest_0.9-40 zoo_1.8-12 beeswarm_0.4.0 ggridges_0.5.4
[81] GlobalOptions_0.1.2 png_0.1-8 viridisLite_0.4.2 rjson_0.2.21
[85] bitops_1.0-7 KernSmooth_2.23-22 spam_2.9-1 Biostrings_2.66.0
[89] DelayedMatrixStats_1.20.0 shape_1.4.6 parallelly_1.36.0 spatstat.random_3.1-6
[93] beachmat_2.14.0 scales_1.2.1 magrittr_2.0.3 plyr_1.8.9
[97] ica_1.0-3 zlibbioc_1.44.0 compiler_4.2.3 dqrng_0.3.1
[101] BiocIO_1.8.0 clue_0.3-64 fitdistrplus_1.1-11 Rsamtools_2.14.0
[105] snakecase_0.11.0 cli_3.6.1 XVector_0.38.0 listenv_0.9.0
[109] pbapply_1.7-2 MASS_7.3-60 tidyselect_1.2.0 MAST_1.22.0
[113] stringi_1.7.12 BPCells_0.1.0 textshaping_0.3.6 yaml_2.3.7
[117] BiocSingular_1.14.0 locfit_1.5-9.7 ggrepel_0.9.3 grid_4.2.3
[121] sass_0.4.7 tools_4.2.3 timechange_0.2.0 future.apply_1.11.0
[125] parallel_4.2.3 circlize_0.4.15 rstudioapi_0.15.0 bluster_1.6.0
[129] foreach_1.5.2 metapod_1.4.0 farver_2.1.1 Rtsne_0.16
[133] digest_0.6.33 shiny_1.7.5 scuttle_1.6.3 later_1.3.1
[137] writexl_1.4.2 RcppAnnoy_0.0.21 httr_1.4.7 ComplexHeatmap_2.12.1
[141] colorspace_2.1-0 XML_3.99-0.14 tensor_1.5 reticulate_1.32.0
[145] splines_4.2.3 uwot_0.1.16 statmod_1.5.0 rematch2_2.1.2
[149] spatstat.utils_3.0-3 scater_1.24.0 xgboost_1.7.5.1 systemfonts_1.0.5
[153] plotly_4.10.2 xtable_1.8-4 jsonlite_1.8.7 R6_2.5.1
[157] pillar_1.9.0 htmltools_0.5.6.1 mime_0.12 glue_1.6.2
[161] fastmap_1.1.1 BiocParallel_1.32.5 BiocNeighbors_1.16.0 codetools_0.2-19
[165] utf8_1.2.3 bslib_0.5.1 lattice_0.21-9 spatstat.sparse_3.0-2
[169] ggbeeswarm_0.7.1 leiden_0.4.3 gtools_3.9.4 survival_3.5-7
[173] limma_3.54.0 rmarkdown_2.25 munsell_0.5.0 GetoptLong_1.0.5
[177] GenomeInfoDbData_1.2.9 iterators_1.0.14 reshape2_1.4.4 gtable_0.3.4